Abstract: This paper presents a novel pairwise constraint
propagation approach by decomposing the challenging constraint
propagation problem into a set of independent semi-supervised
learning subproblems which can be solved in quadratic time
using label propagation based on k-nearest neighbor graphs.
Considering that this time cost is proportional to the number of 
all possible pairwise constraints, our approach actually provides 
an efficient solution for exhaustively propagating pairwise
constraints throughout the entire dataset. The resulting exhaustive
set of propagated pairwise constraints is further used to adjust
the similarity matrix for constrained spectral clustering. Other 
than the traditional constraint propagation on single-source data, 
our approach is also extended to more challenging constraint 
propagation on multi-source data where each pairwise constraint 
is defined over a pair of data points from different sources. This 
multi-source constraint propagation has an important application 
to cross-modal multimedia retrieval. Extensive results have shown 
the superior performance of our approach. 

Index Terms — Pairwise constraint propagation, semi- 
supervised learning, constrained spectral clustering, multi-source 
data, cross-modal multimedia retrieval. 



I. Introduction 

In computer vision and multimedia content analysis, much 
effort has been made to handle different challenging problems 
by developing new machine learning techniques. Although 
encouraging results have been reported in the literature, these 
techniques suffer from severe performance degradation in 
practice, due to complicated data structures, inherent model- 
ing limitations and so on. Therefore, any extra supervisory 
information must be exploited to reduce such performance 
degradation and improve the quality of machine learning. The 
labels of data points are potential sources of such supervisory 
information which has been widely used. In this paper, we 
consider a commonly adopted and weaker type of supervisory 
information, called pairwise constraints which specify whether 
a pair of data points occur together. 

There exist two types of pairwise constraints, known as 
must-link constraints and cannot-link constraints, respectively.
We can readily derive such pairwise constraints from the labels
of data points, where a pair of data points with the same
label denotes a must-link constraint and a pair with different
labels denotes a cannot-link constraint. It should be noted,
however, that the inverse may

[Figure 1: three annotated images. (a) horse, foal, flower, grass; (b) zebra, herd, field, tree; (c) horse, foal, grass, tree. Must-link: (a, c). Cannot-link: (a, b), (b, c).]

Fig. 1. The must-link and cannot-link constraints derived from image
annotations. Since we focus on recognizing the objects of interest in images,
these pairwise constraints are formed without considering the backgrounds
such as tree, grass, and field.



not be true, i.e. in general we cannot infer the labels of data
points from pairwise constraints, particularly for multi-class
datasets. This implies that pairwise constraints are inherently
weaker but more general than the labels of data points.
Moreover, pairwise constraints can also be automatically derived
from domain knowledge [1], [2] or through machine learning.
For example, we can obtain pairwise constraints from the
annotations of the images shown in Fig. 1. Since we focus
on recognizing the objects of interest (e.g. horse and zebra)
in images, the pairwise constraints can be formed without
considering the backgrounds such as tree, grass, and field.
In practice, the objects of interest in images can be roughly 
distinguished from the backgrounds according to the ranking 
scores of annotations learnt automatically by an image search 
engine such as Google. 

Pairwise constraints have been widely used for many machine
learning problems such as constrained clustering [1]-[4]
and metric learning [5]-[7], and it has been reported that the
use of appropriate pairwise constraints can often lead to
improved results. In this paper, for the convenience of clarifying
our motivation, we focus on constrained spectral clustering,
i.e., the exploitation of pairwise constraints for spectral
clustering [8]-[11], which constructs a new low-dimensional data
representation for clustering using the leading eigenvectors
of the similarity matrix. Since pairwise constraints specify
whether a pair of data points occur together, they provide a
source of information about the data relationships, which can
be readily used to adjust the similarities between data points
for spectral clustering. In fact, constrained spectral clustering
has been extensively studied previously. For example, [12]
trivially adjusted the similarities between data points to 1
and 0 for must-link and cannot-link constraints, respectively.






This method only adjusts the similarities between constrained
data points. In contrast, [11] propagated pairwise constraints
to other similarities between unconstrained data points using
a Gaussian process. However, as noted in [13], this method
makes certain assumptions for constraint propagation specifically
with respect to two-class problems, although a time-consuming
heuristic approach for multi-class problems is also discussed.
Furthermore, such constraint propagation is also formulated
as a semi-definite programming (SDP) problem in [14].
Although the method is not limited to two-class problems, it
incurs extremely large computational cost for solving the SDP
problem. In [15], the pairwise constraint propagation is also
formulated as a constrained optimization problem, but only
must-link constraints can be used for optimization.

To overcome these problems with constrained spectral
clustering, we propose an exhaustive and efficient constraint
propagation approach [16], which is not limited to two-class
problems or to using only must-link constraints. Specifically, we
decompose the challenging constraint propagation problem
into a set of independent semi-supervised learning [17]-[19]
subproblems. By formulating these subproblems
uniformly as minimizing a regularized energy functional, we
can transform the pairwise constraint propagation into solving
a continuous-time Lyapunov equation [20], which occurs in
many branches of Control Theory such as optimal control and
stability analysis [21]. Considering that directly solving the
Lyapunov equation scales polynomially with the data size, we
further develop an approximate but efficient algorithm based
on $k$-nearest neighbor ($k$-NN) graphs using label propagation
introduced in [17]. Since the time complexity of this algorithm
is quadratic with respect to the data size $N$ and proportional
to the total number of all possible pairwise constraints (i.e.
$N(N-1)/2$), it can be considered computationally efficient.
Compared with the SDP-based constraint propagation [14],
which has a time complexity of $O(N^4)$, this algorithm
incurs much less time cost. Finally, the resulting exhaustive
set of propagated pairwise constraints can be used to adjust
the similarity matrix for spectral clustering. Although our
constraint propagation and similarity adjustment approaches
are proposed in the context of constrained spectral clustering,
they can be readily applied to many other machine learning
problems initially provided with pairwise constraints.

It should be noted that the aforementioned pairwise constraint
propagation is limited to single-source data. That is,
each pairwise constraint is defined over a pair of data points
from the same source. In this paper, we make a further attempt
to handle the more challenging constraint propagation on
multi-source data where each pairwise constraint is defined over a
pair of data points from different sources. In this case, pairwise
constraints still specify whether a pair of data points occur
together and the goal of constraint propagation remains the same
(i.e. to propagate the initial pairwise constraints throughout the
entire dataset). The main difficulty of multi-source constraint
propagation lies in how to propagate pairwise constraints
across different sources. Fortunately, this challenging problem
can be readily decomposed into a series of two-source constraint
propagation subproblems. More importantly, such two-source
constraint propagation can be formulated as minimizing
a regularized energy functional from a semi-supervised learning
perspective, similar to constraint propagation on single-source
data. Finally, we succeed in developing a similarly efficient
algorithm for multi-source constraint propagation. When the multiple
sources refer to text, image, audio and so on, the output of
our multi-source constraint propagation actually denotes the
correlation between different media sources. That is, our
multi-source constraint propagation can be directly used for
cross-modal multimedia retrieval, which has drawn much attention
recently [22]. For cross-modal retrieval, it is not a feasible
solution to combine multiple modalities as previous multi-modal
multimedia retrieval methods [23]-[25] did.

In summary, our exhaustive and efficient constraint propa- 
gation approach can be seen as a very general technique and 
has the following advantages: 

• This is the first attempt to clearly show how pairwise
constraints are propagated throughout the entire dataset
from a semi-supervised learning perspective. Moreover,
pairwise constraint propagation is shown for the first time
to be equivalent to solving a Lyapunov equation.

• Different from many previous methods, our approach is
not limited to two-class problems or to using only must-link
constraints. Moreover, it allows soft constraints [26].

• Although developed in the context of constrained spectral 
clustering, our approach has the potential to improve the 
performance of many other machine learning techniques. 

• When extended to more challenging multi-source con- 
straint propagation, our approach can achieve promising 
results in the application of cross-modal retrieval. 

Moreover, beyond our short conference version [16], this paper
presents three additional contributions: a more insightful
interpretation of pairwise constraint propagation from a
semi-supervised learning perspective (see Section II-A), more
explanation of our motivation for adjusting the similarity matrix
using the propagated pairwise constraints (see Proposition
3), and a nontrivial extension to the more challenging multi-source
constraint propagation (see Section III).

The remainder of this paper is organized as follows. In
Section II, we propose an exhaustive and efficient constraint
propagation approach. In Section III, our approach is extended
to more challenging multi-source constraint propagation. In
Section IV, we present the experimental results to evaluate
our approach. Finally, Section V gives the conclusions.

II. Exhaustive and Efficient Constraint 
Propagation 

This section presents our exhaustive and efficient constraint
propagation in detail. We first give our solution to the
challenging constraint propagation problem from a semi-supervised
learning perspective, and then propose an approximate but
efficient algorithm. Finally, we apply the proposed constraint
propagation algorithm to constrained spectral clustering.

A. Problem and Solution 

Fig. 2. Illustration of the matrix $Z$. When we focus on a single data point (e.g.
$x_3$ here), the pairwise constraint propagation can be viewed as a two-class
semi-supervised learning problem in both vertical and horizontal directions.

Given a dataset $\mathcal{X} = \{x_1, \ldots, x_N\}$, we denote a set of initial
must-link constraints as $\mathcal{M} = \{(x_i, x_j) : l_i = l_j\}$ and a set
of initial cannot-link constraints as $\mathcal{C} = \{(x_i, x_j) : l_i \neq l_j\}$,
where $l_i$ is the label of data point $x_i$. As we have mentioned,
the two sets of initial pairwise constraints can be directly used 
to adjust the similarities between data points. In previous work 
[12], only the similarities between the constrained data points
are adjusted, and thus the initial pairwise constraints exert very
limited effect on the total similarity adjustment. In this paper,
we attempt to spread the effect of pairwise constraints
throughout the entire dataset, thereby enabling the initial
pairwise constraints to exert a stronger influence on the total
similarity adjustment. For the convenience of description, the
exhaustive set of propagated pairwise constraints is denoted
as $F \in \mathcal{F}$, where $\mathcal{F} = \{F = \{f_{ij}\}_{N \times N} : |f_{ij}| \le 1\}$. It should
be noted that $f_{ij} > 0$ means $(x_i, x_j)$ is a must-link constraint
while $f_{ij} < 0$ means $(x_i, x_j)$ is a cannot-link constraint,
with $|f_{ij}|$ denoting the confidence score of $(x_i, x_j)$ being a
must-link (or cannot-link) constraint. In the following, we will
develop a novel pairwise constraint propagation method to find
the best solution $F^* \in \mathcal{F}$ based on the two sets of initial
pairwise constraints $\mathcal{M}$ and $\mathcal{C}$.

A main obstacle to pairwise constraint propagation is
that the initial cannot-link constraints are not transitive,
particularly for multi-class problems. In this paper, however, we
succeed in propagating both must-link and cannot-link constraints
throughout the entire dataset. Similar to the representation
of the exhaustive set of propagated pairwise constraints, we
denote the two sets of initial pairwise constraints $\mathcal{M}$ and $\mathcal{C}$
with a single matrix $Z = \{z_{ij}\}_{N \times N}$:

$$z_{ij} = \begin{cases} +1, & (x_i, x_j) \in \mathcal{M}; \\ -1, & (x_i, x_j) \in \mathcal{C}; \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$

The above definition is inherently suitable for multi-class
problems. We have $|z_{ij}| \le 1$ for soft constraints [26]. This
means that $Z \in \mathcal{F}$. Since we can directly infer the initial
pairwise constraints from $Z$, the initial pairwise constraints
have been represented using $Z$ without loss of information.
An example of $Z$ is illustrated in Fig. 2.
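
As a minimal sketch of how the matrix $Z$ of equation (1) might be assembled in practice, the following Python snippet derives must-link and cannot-link pairs from a small labeled subset and fills $Z$ accordingly. The helper names, the toy labels, and the choice to leave the diagonal at zero are assumptions made for illustration only and are not taken from the paper.

```python
import numpy as np

def constraints_from_labels(labels, labeled_idx):
    """Derive must-link / cannot-link index pairs among the labeled points."""
    must, cannot = [], []
    for a, i in enumerate(labeled_idx):
        for j in labeled_idx[a + 1:]:
            # Same label -> must-link, different label -> cannot-link.
            (must if labels[i] == labels[j] else cannot).append((i, j))
    return must, cannot

def build_constraint_matrix(n, must_link, cannot_link):
    """Return the N x N matrix Z of Eq. (1); unconstrained entries stay 0."""
    Z = np.zeros((n, n))
    for i, j in must_link:
        Z[i, j] = Z[j, i] = 1.0    # must-link: z_ij = +1 (stored symmetrically)
    for i, j in cannot_link:
        Z[i, j] = Z[j, i] = -1.0   # cannot-link: z_ij = -1
    return Z

# Hypothetical toy setting: five points, of which only points 0, 1, 2 are labeled.
labels = [0, 1, 0, 1, 2]
must, cannot = constraints_from_labels(labels, [0, 1, 2])
Z = build_constraint_matrix(len(labels), must, cannot)
```

In this toy setting the last two columns of `Z` contain no constraints at all, which is exactly the situation that motivates combining vertical and horizontal propagation.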

We make further observations on $Z$ column by column. It
can be observed that the $j$-th column $Z_{\cdot j}$ actually provides the
initial configuration of a two-class semi-supervised learning
problem with respect to $x_j$ (see Fig. 2), where the "positive
class" contains the data points that must appear together with
$x_j$ and the "negative class" contains the data points that cannot
appear together with $x_j$. More concretely, $x_i$ can be initially
regarded as coming from the positive (or negative) class if
$z_{ij} > 0$ (or $< 0$), whereas $x_i$ is initially unlabeled if $x_i$ and
$x_j$ are not constrained (i.e. $z_{ij} = 0$). This configuration of a
two-class semi-supervised learning problem is also suitable for
soft constraints. According to [17], [18], the semi-supervised
learning problem with respect to $x_j$ in the vertical direction can
be formulated as minimizing a regularized energy functional.

Given the dataset $\mathcal{X}$, we define an undirected weighted
graph $\mathcal{G} = (\mathcal{V}, W)$ with its vertex set $\mathcal{V} = \mathcal{X}$ and weight
matrix $W = \{w_{ij}\}_{N \times N}$, where $w_{ij}$ is the weight of the edge
between $x_i$ and $x_j$. The weight matrix $W$ is assumed to be
nonnegative and symmetric. The normalized graph Laplacian
of $\mathcal{G}$ is given by

$$L = I - D^{-1/2} W D^{-1/2}, \qquad (2)$$

where $I$ is an $N \times N$ identity matrix and $D$ is an $N \times N$
diagonal matrix with its $i$-th diagonal element being $\sum_j w_{ij}$.
Based on this normalized graph Laplacian $L$, the pairwise
constraint propagation with respect to $x_j$ in the vertical direction
(see Fig. 2) is formulated as

$$\min_{F_{\cdot j}} \frac{1}{2} \|F_{\cdot j} - Z_{\cdot j}\|_2^2 + \frac{\mu}{2} F_{\cdot j}^T L F_{\cdot j}, \qquad (3)$$

where $\mu > 0$ is a regularization parameter and $F_{\cdot j}$ (or
$Z_{\cdot j}$) is the $j$-th column of $F$ (or $Z$). The first term of the
above equation denotes the fitting error, which penalizes large
changes between the propagated pairwise constraints and the
initial ones. The second term denotes an energy functional
(also known as the smoothness measure) defined based on the
graph $\mathcal{G}$, which penalizes large changes between nearby data
points. More details of the definition of this energy functional
can be found in [18].

Since the other columns of $Z$ can be handled similarly, we
can decompose the pairwise constraint propagation problem
in the vertical direction into $N$ independent label propagation
subproblems which can then be solved in parallel. By merging
all of these subproblems into a single optimization problem,
the vertical constraint propagation is formulated as:

$$\min_F \frac{1}{2} \|F - Z\|_F^2 + \frac{\mu}{2} \mathrm{tr}(F^T L F), \qquad (4)$$

where $\mathrm{tr}(\cdot)$ stands for the trace of a matrix. That is, similar
to [17], [18], we have formulated the vertical constraint
propagation as minimizing a regularized energy functional.
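
As a short worked step, the minimizer of the vertical problem can be written in closed form by setting the gradient of the objective to zero. The derivation below assumes only what is stated above, namely that $L$ is symmetric and $\mu > 0$ (so that $I + \mu L$ is invertible). The symbol $F_v^*$ for the vertical minimizer follows the notation used later in the text.

```latex
\frac{\partial}{\partial F}\Big[\tfrac{1}{2}\|F - Z\|_F^2
      + \tfrac{\mu}{2}\,\mathrm{tr}(F^T L F)\Big]
  = (F - Z) + \mu L F = 0
\quad\Longrightarrow\quad
F_v^* = (I + \mu L)^{-1} Z .
```

Read column by column, this gives $F_{\cdot j} = (I + \mu L)^{-1} Z_{\cdot j}$, which makes explicit that the $N$ column subproblems decouple and that an all-zero column of $Z$ stays all-zero under vertical propagation alone.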

However, it is also possible that a column of $Z$ contains
no pairwise constraints (for example, see the fifth column in
Fig. 2). That is, the entries of this column may all be zeros, and
for such cases, there is no vertical constraint propagation along
this column. We deal with this problem through horizontal
constraint propagation, which is performed by considering $Z$
row by row, instead of column-wise. When we focus on a
single data point $x_j$, the pairwise constraint propagation with
respect to $x_j$ in the horizontal direction (see Fig. 2) can also be
viewed as a two-class semi-supervised learning problem and
then be formulated similarly to equation (3):

$$\min_{F_{j\cdot}} \frac{1}{2} \|F_{j\cdot} - Z_{j\cdot}\|_2^2 + \frac{\mu}{2} F_{j\cdot} L F_{j\cdot}^T, \qquad (5)$$

where $\mu > 0$ is a regularization parameter and $F_{j\cdot}$ (or $Z_{j\cdot}$)
is the $j$-th row of $F$ (or $Z$). In fact, the above optimization
problem is equivalent to

$$\min_{F_{j\cdot}^T} \frac{1}{2} \|F_{j\cdot}^T - Z_{j\cdot}^T\|_2^2 + \frac{\mu}{2} (F_{j\cdot}^T)^T L F_{j\cdot}^T, \qquad (6)$$

which takes the same form as equation (3). If all the $N$ rows
are considered together, we can obtain the following horizontal
constraint propagation:

$$\min_F \frac{1}{2} \|F - Z\|_F^2 + \frac{\mu}{2} \mathrm{tr}(F L F^T). \qquad (7)$$

Finally, the vertical and horizontal constraint propagation
can be combined by:

$$\min_F \|F - Z\|_F^2 + \frac{\mu}{2} \mathrm{tr}(F^T L F + F L F^T). \qquad (8)$$

The distinct advantage of such combination is that we can
propagate the initial pairwise constraints to any pair of data
points by considering the two directions simultaneously. That
is, the constraint propagation may not break down even if
the pairwise constraints are missing for certain data points.
Let $Q(F)$ denote the objective function in equation (8).
Differentiating $Q(F)$ with respect to $F$ and setting it to zero,
we have the following equation:

$$\frac{\partial Q}{\partial F} = 2(F - Z) + \mu L F + \mu F L = 0, \qquad (9)$$

which can be transformed into a symmetric form

$$(I + \mu L) F + F (I + \mu L) = 2Z. \qquad (10)$$

In fact, the above equation is a standard continuous-time
Lyapunov matrix equation [20] and is used in various areas
of Control Theory such as optimal control and stability
analysis [21]. According to Proposition 1, the Lyapunov matrix
equation (10) has a unique and symmetric solution.

Proposition 1: The Lyapunov matrix equation (10) has a
unique and symmetric solution.

Proof: Since the graph Laplacian $L$ is positive
semidefinite and $\mu > 0$, the matrix $I + \mu L$ is positive definite
and all of its eigenvalues are positive. Hence, for any pair
of eigenvalues of $I + \mu L$, i.e. $\lambda_i$ and $\lambda_j$, we have $\lambda_i + \lambda_j \neq 0$.
According to the standard solvability condition for Lyapunov
equations, the Lyapunov matrix equation (10) therefore has a
unique solution. Moreover, since both $L$ and $Z$ are symmetric,
we can get $((I + \mu L)F)^T + (F(I + \mu L))^T = 2Z^T$, i.e.
$F^T(I + \mu L) + (I + \mu L)F^T = 2Z$. This means that $F^T$ is
a solution of the Lyapunov matrix equation (10) if $F$ is.
However, we have proven that this equation has a unique
solution. Hence, $F^T = F$ if $F$ is the solution. ■
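
To make the proposition concrete, here is a minimal NumPy sketch that solves an equation of the form $AF + FA = C$ with $A = I + \mu L$ and $C = 2Z$ through the eigendecomposition of $A$. It relies on the same fact used in the proof: every sum $\lambda_i + \lambda_j$ is positive, so the element-wise division is well defined. The 4-node path graph, the two constraints, and the value $\mu = 1$ are hypothetical choices for illustration; this is not the numerical method used by the authors.

```python
import numpy as np

def solve_lyapunov_sym(A, C):
    """Solve A F + F A = C for a symmetric positive definite matrix A."""
    lam, U = np.linalg.eigh(A)                 # A = U diag(lam) U^T
    C_hat = U.T @ C @ U                        # express C in the eigenbasis of A
    F_hat = C_hat / (lam[:, None] + lam[None, :])   # lam_i + lam_j > 0
    return U @ F_hat @ U.T

# Hypothetical toy graph: a path on four nodes.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
L = np.eye(4) - W / np.sqrt(np.outer(d, d))    # normalized Laplacian, Eq. (2)

Z = np.zeros((4, 4))
Z[0, 1] = Z[1, 0] = 1.0                        # one must-link constraint
Z[0, 3] = Z[3, 0] = -1.0                       # one cannot-link constraint

mu = 1.0                                       # arbitrary illustrative value
A = np.eye(4) + mu * L
F = solve_lyapunov_sym(A, 2 * Z)               # solves Eq. (10)
residual = np.abs(A @ F + F @ A - 2 * Z).max()
```

The eigendecomposition costs cubic time in $N$, which illustrates why a direct solve becomes expensive on large datasets.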

In summary, by formulating the pairwise constraint propagation
as minimizing a regularized energy functional from a
semi-supervised learning perspective, we have, for the first time,
shown that pairwise constraint propagation is equivalent to
solving a Lyapunov equation, which has been widely used
in Control Theory. That is, we have established an interesting
link between pairwise constraint propagation and the
Lyapunov equation. However, directly solving the Lyapunov
equation has a polynomial time complexity with respect to
the data size, although many numerical methods have been
developed in the literature. In the following, we will propose
an approximate but efficient algorithm, instead of directly
solving the Lyapunov equation.

B. The Proposed Algorithm 

Although the pairwise constraint propagation problem (8)
can be handled by solving the Lyapunov matrix equation
(10), it leads to a large computational cost. To develop
an efficient algorithm, we approximately solve the pairwise
constraint propagation problem (8) in two optimization steps:
(1) $F_v^* = \arg\min_F \frac{1}{2}\|F - Z\|_F^2 + \frac{\mu}{2} \mathrm{tr}(F^T L F)$;
(2) $F_h^* = \arg\min_F \frac{1}{2}\|F - F_v^*\|_F^2 + \frac{\mu}{2} \mathrm{tr}(F L F^T)$.
That is, we first perform
the vertical constraint propagation and then the horizontal
constraint propagation. Based on $k$-nearest neighbor ($k$-NN)
graphs, both vertical and horizontal constraint propagation can
be solved efficiently using the label propagation technique
introduced in [17]. Strictly speaking, this optimization
strategy may only find a suboptimal solution for our pairwise
constraint propagation given by equation (8). Fortunately, our
later experimental results have demonstrated that the solution
obtained by this optimization strategy is comparable to that of
the Lyapunov matrix equation.

As we have mentioned, a candidate set of propagated
pairwise constraints can be denoted as $F \in \mathcal{F}$, where
$\mathcal{F} = \{F = \{f_{ij}\}_{N \times N} : |f_{ij}| \le 1\}$. In particular, $Z \in \mathcal{F}$,
where $Z$ collects the initial pairwise constraints. To find the
best solution $F^* \in \mathcal{F}$ based on $Z$, we propose the following
approximate algorithm using the above optimization strategy:

(1) Construct a $k$-NN graph by defining its weight matrix
$W = \{w_{ij}\}_{N \times N}$ as: $w_{ij} = \frac{a(x_i, x_j)}{\sqrt{a(x_i, x_i)}\sqrt{a(x_j, x_j)}}$ if $x_j$
($j \neq i$) is among the $k$-nearest neighbors of $x_i$ and
$w_{ij} = 0$ otherwise, where $A = \{a(x_i, x_j)\}_{N \times N}$
is the kernel matrix defined on the dataset $\mathcal{X}$. Set
$W = (W + W^T)/2$ to ensure $W$ is symmetric.

(2) Compute the matrix $\bar{L} = D^{-1/2} W D^{-1/2}$, where $D$
is a diagonal matrix with its $i$-th diagonal element
being $\sum_j w_{ij}$.

(3) Iterate $F_v(t + 1) = \alpha \bar{L} F_v(t) + (1 - \alpha) Z$ for
the vertical constraint propagation until convergence,
where $F_v(t) \in \mathcal{F}$ and $\alpha$ is a parameter in the range
(0,1).

(4) Iterate $F_h(t + 1) = \alpha F_h(t) \bar{L} + (1 - \alpha) F_v^*$ for the
horizontal constraint propagation until convergence,
where $F_h(t) \in \mathcal{F}$ and $F_v^*$ is the limit of $\{F_v(t)\}$.

(5) Output $F^* = F_h^*$ as the final representation of the
propagated pairwise constraints, where $F_h^*$ is the
limit of $\{F_h(t)\}$.
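
The five steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Gaussian kernel with unit bandwidth, the kernel normalization inside `knn_weights`, the values `k=2` and `alpha=0.8`, the stopping tolerance, and the six toy points are all hypothetical choices. For clarity the sketch uses dense matrix products, so it does not exploit the sparsity of the $k$-NN graph.

```python
import numpy as np

def knn_weights(A, k):
    """Step (1): symmetric k-NN weight matrix from a kernel matrix A."""
    n = A.shape[0]
    scale = np.sqrt(np.diag(A))
    S = A / np.outer(scale, scale)       # assumed normalization a_ij / sqrt(a_ii a_jj)
    W = np.zeros((n, n))
    for i in range(n):
        order = [j for j in np.argsort(-S[i]) if j != i]
        nbrs = order[:k]                 # k most similar points, excluding i itself
        W[i, nbrs] = S[i, nbrs]
    return (W + W.T) / 2.0               # symmetrize

def propagate(Z, W, alpha=0.8, tol=1e-10, max_iter=10000):
    """Steps (2)-(5): vertical, then horizontal propagation by iteration."""
    d = W.sum(axis=1)
    L_bar = W / np.sqrt(np.outer(d, d))  # Step (2): D^{-1/2} W D^{-1/2}

    def iterate(step, start):
        F = start.copy()
        for _ in range(max_iter):
            F_next = step(F)
            if np.abs(F_next - F).max() < tol:
                return F_next
            F = F_next
        return F

    F_v = iterate(lambda F: alpha * (L_bar @ F) + (1 - alpha) * Z, Z)      # Step (3)
    F_h = iterate(lambda F: alpha * (F @ L_bar) + (1 - alpha) * F_v, F_v)  # Step (4)
    return F_h                                                             # Step (5)

# Hypothetical toy data: two well-separated groups of three 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
A = np.exp(-sq / 2.0)                    # Gaussian kernel, bandwidth 1 (assumed)
W = knn_weights(A, k=2)

Z = np.zeros((6, 6))
Z[0, 1] = Z[1, 0] = 1.0                  # one must-link inside the first group
Z[0, 3] = Z[3, 0] = -1.0                 # one cannot-link across the groups
F = propagate(Z, W)
```

On this toy input, the single must-link constraint spreads to the unconstrained pair (1, 2) inside the first group, and the single cannot-link constraint spreads to unconstrained cross-group pairs such as (2, 4).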

Below we give a convergence analysis of the above constraint
propagation algorithm. Since the vertical constraint
propagation in Step (3) can be regarded as label propagation,
its convergence has been shown in [17]. More concretely,
similar to [17], we have $F_v^* = (1 - \alpha)(I - \alpha \bar{L})^{-1} Z$ as the limit of
$\{F_v(t)\}$, where $\bar{L} = D^{-1/2} W D^{-1/2}$ is the matrix computed in Step (2).
Meanwhile, we can directly obtain an analytical
solution $F_v^* = (I + \mu L)^{-1} Z$ for the vertical constraint propagation
$F_v^* = \arg\min_F \frac{1}{2}\|F - Z\|_F^2 + \frac{\mu}{2} \mathrm{tr}(F^T L F)$. Since $\bar{L} = I - L$,
the analytical solution equals $F_v^* = ((\mu + 1)I - \mu \bar{L})^{-1} Z$,



Fig. 3. Illustration of the propagated pairwise constraints obtained by our algorithm: (a) four pairwise constraints and ideal clustering of the dataset; (b)
final constraints propagated from only two must-link constraints; (c) final constraints propagated from only two cannot-link constraints; (d) final constraints
propagated from four pairwise constraints. Here, must-link constraints are denoted by solid red lines, while cannot-link constraints are denoted by dashed
blue lines. Moreover, we only show the propagated constraints with predicted confidence scores > 0.1 in Figs. 3(b)-3(d).



which means that $\alpha = \mu/(\mu + 1)$. As for the horizontal
constraint propagation, we have

$$F_h^T(t + 1) = \alpha \bar{L}^T F_h^T(t) + (1 - \alpha) F_v^{*T} = \alpha \bar{L} F_h^T(t) + (1 - \alpha) F_v^{*T}. \qquad (11)$$

That is, the horizontal propagation in Step (4) can be transformed
into a vertical propagation which converges to $F_h^{*T} =
(1 - \alpha)(I - \alpha \bar{L})^{-1} F_v^{*T}$. Hence, our pairwise constraint
propagation algorithm has a closed-form solution as follows:

$$F^* = F_h^* = (1 - \alpha) F_v^* (I - \alpha \bar{L}^T)^{-1} = (1 - \alpha)^2 (I - \alpha \bar{L})^{-1} Z (I - \alpha \bar{L})^{-1}, \qquad (12)$$

which actually accumulates the evidence to reconcile the 
contradictory propagated constraints for certain pairs of data 
points. As a toy example, the propagated pairwise constraints
given by equation (12) are explicitly shown in Fig. 3. We can
see that the propagated pairwise constraints obtained by our
algorithm are consistent with the ideal clustering of the dataset.
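
The closed form can also be checked numerically. The sketch below, written under illustrative assumptions (a hypothetical 4-node weight matrix, two constraints, and $\mu = 4$, hence $\alpha = \mu/(\mu + 1) = 0.8$), evaluates the closed-form vertical and final solutions and compares the vertical one with the energy-minimization form $(I + \mu L)^{-1} Z$.

```python
import numpy as np

# Hypothetical symmetric weight matrix for a 4-node graph.
W = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.9, 0.0, 0.2, 0.1],
              [0.1, 0.2, 0.0, 0.8],
              [0.0, 0.1, 0.8, 0.0]])
d = W.sum(axis=1)
L_bar = W / np.sqrt(np.outer(d, d))      # D^{-1/2} W D^{-1/2}
I = np.eye(4)
L = I - L_bar                            # normalized graph Laplacian

Z = np.zeros((4, 4))
Z[0, 1] = Z[1, 0] = 1.0                  # must-link
Z[0, 3] = Z[3, 0] = -1.0                 # cannot-link

mu = 4.0                                 # illustrative regularization value
alpha = mu / (mu + 1.0)                  # relation derived in the text

M = np.linalg.inv(I - alpha * L_bar)
F_v = (1 - alpha) * M @ Z                # limit of the vertical iteration
F_star = (1 - alpha) * F_v @ M           # closed-form final solution

# Vertical solution obtained directly from the regularized energy functional.
F_v_energy = np.linalg.solve(I + mu * L, Z)

# F_star should be a fixed point of the horizontal update in Step (4).
fixed_point_gap = np.abs(alpha * F_star @ L_bar + (1 - alpha) * F_v - F_star).max()
```

Explicit matrix inversion is used here only because the example is tiny. On real data the iterative updates on a sparse $k$-NN graph are the intended route.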

The above closed-form solution can be further discussed in
detail. Firstly, given that both $\bar{L}$ and $Z$ are symmetric, this
solution is symmetric, just as the solution of the Lyapunov
matrix equation (see Proposition 1). Secondly, similar to the
above convergence analysis, we can readily obtain the same
solution (12) if we first perform the horizontal constraint
propagation and then the vertical constraint propagation. To
summarize, we have the following proposition:

Proposition 2: (i) The closed-form solution (12) is a feasible
solution of the Lyapunov matrix equation (10); (ii)
the pairwise constraint propagation in two directions can be
alternately performed, no matter which is first.

Although the closed-form solution is only proven to be a
feasible solution, our later results have shown that it is
comparable to the solution of the Lyapunov matrix equation.

Finally, we give a complexity analysis of our pairwise
constraint propagation algorithm. Through semi-supervised
learning based on $k$-NN graphs ($k \ll N$), both vertical
and horizontal constraint propagation can be performed in
quadratic time $O(kN^2)$. Since this time complexity is proportional
to the total number of all possible pairwise constraints
(i.e. $N(N-1)/2$), our algorithm can be considered computationally
efficient. Moreover, our algorithm incurs significantly
less computational cost than [14], given that pairwise constraint
propagation based on semi-definite programming has a
time complexity of $O(N^4)$.

C. Application to Constrained Spectral Clustering 

It should be noted that the output $F^* = \{f^*_{ij}\}_{N \times N}$ of our
constraint propagation algorithm represents an exhaustive set
of pairwise constraints with the associated confidence scores
$|F^*|$. Although this exhaustive set of propagated pairwise
constraints can be used for many machine learning problems,
we focus on constrained spectral clustering whose goal is
to obtain a data partition that is fully consistent with the
propagated pairwise constraints. More concretely, $F^*$ can be
exploited for constrained spectral clustering by adjusting the
original normalized weight matrix $W$ (i.e. $0 \le w_{ij} \le 1$) of



Fig. 4. The results of constrained clustering on the toy dataset using four pairwise constraints given by Fig. 3(a): (a) spectral learning [12]; (b) our approach.
The clustering obtained by our approach is consistent with the ideal clustering.



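the graph constructed for spectral clustering:

$$\tilde{w}_{ij} = \begin{cases} 1 - (1 - f^*_{ij})(1 - w_{ij}), & f^*_{ij} \ge 0; \\ (1 + f^*_{ij})\, w_{ij}, & f^*_{ij} < 0. \end{cases} \qquad (13)$$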



The distinct advantage of this weight (or similarity) adjustment 
strategy is that it can be readily applied to many other machine 
learning problems provided with pairwise constraints initially. 
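
A minimal sketch of this weight adjustment, assuming `W` holds normalized weights in [0, 1] and `F` holds propagated scores in [-1, 1], is given below. The function name and the 3 x 3 toy matrices are hypothetical and serve only to show the two branches of the rule.

```python
import numpy as np

def adjust_weights(W, F):
    """Raise w_ij where f_ij >= 0 and shrink it where f_ij < 0."""
    W = np.asarray(W, dtype=float)
    F = np.asarray(F, dtype=float)
    must_branch = 1.0 - (1.0 - F) * (1.0 - W)   # used where f_ij >= 0
    cannot_branch = (1.0 + F) * W               # used where f_ij < 0
    return np.where(F >= 0, must_branch, cannot_branch)

# Hypothetical toy inputs: pair (0, 1) is weakly similar but strongly must-link,
# pair (0, 2) is strongly similar but strongly cannot-link.
W = np.array([[1.0, 0.1, 0.9],
              [0.1, 1.0, 0.5],
              [0.9, 0.5, 1.0]])
F = np.array([[0.0, 0.8, -0.8],
              [0.8, 0.0, 0.0],
              [-0.8, 0.0, 0.0]])
W_new = adjust_weights(W, F)
```

With these inputs the weight of pair (0, 1) rises from 0.1 to 0.82, the weight of pair (0, 2) drops from 0.9 to 0.18, and the unconstrained pair (1, 2) keeps its weight of 0.5.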

In the following, the new matrix $\tilde{W} = \{\tilde{w}_{ij}\}_{N \times N}$ will be
directly used for spectral clustering. Here, we need to first
prove that $\tilde{W}$ can be viewed as a new normalized weight
matrix by showing that it has the following nice properties.

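Proposition 3: (i) $\tilde{W}$ is nonnegative and symmetric; (ii)
$\tilde{w}_{ij} \in [0, 1]$; (iii) $\tilde{w}_{ij} \ge w_{ij}$ (or $< w_{ij}$) if $f^*_{ij} \ge 0$ (or $< 0$);
(iv) $\partial \tilde{w}_{ij} / \partial w_{ij} = 1 - |f^*_{ij}|$; (v) $\tilde{w}_{ij} \ge f^*_{ij}$ (or $\le 1 + f^*_{ij}$) if $f^*_{ij} \ge 0$
(or $< 0$).

Proof: The above proposition is proven as follows:

(i) The symmetry of both $W$ and $F^*$ ensures that $\tilde{W}$ is
symmetric. Since $0 \le w_{ij} \le 1$ and $|f^*_{ij}| \le 1$, we also
have: $\tilde{w}_{ij} = 1 - (1 - f^*_{ij})(1 - w_{ij}) \ge 1 - (1 - w_{ij}) \ge 0$
if $f^*_{ij} \ge 0$ and $\tilde{w}_{ij} = (1 + f^*_{ij}) w_{ij} \ge 0$ if $f^*_{ij} < 0$.
That is, we always have $\tilde{w}_{ij} \ge 0$. Hence, $\tilde{W}$ is
nonnegative and symmetric.

(ii) We have proven that $\tilde{w}_{ij} \ge 0$ when $0 \le w_{ij} \le 1$ and
$|f^*_{ij}| \le 1$. Similarly, we can prove that $\tilde{w}_{ij} \le 1$.

(iii) According to equation (13), $\tilde{w}_{ij}$ can be viewed as
a monotonically increasing function of $f^*_{ij}$. Since
$\tilde{w}_{ij} = w_{ij}$ when $f^*_{ij} = 0$, we thus have: $\tilde{w}_{ij} \ge w_{ij}$
(or $< w_{ij}$) if $f^*_{ij} \ge 0$ (or $< 0$).

(iv) $\partial \tilde{w}_{ij} / \partial w_{ij} = 1 - f^*_{ij}$ if $f^*_{ij} \ge 0$, while $\partial \tilde{w}_{ij} / \partial w_{ij} = 1 + f^*_{ij}$ if
$f^*_{ij} < 0$. In other words, $\partial \tilde{w}_{ij} / \partial w_{ij} = 1 - |f^*_{ij}|$.

(v) If $f^*_{ij} \ge 0$, $\tilde{w}_{ij}$ can be viewed as a monotonically
increasing function of $w_{ij}$. Since $\tilde{w}_{ij} = f^*_{ij}$ when
$w_{ij} = 0$, we have: $\tilde{w}_{ij} \ge f^*_{ij}$ if $w_{ij} \ge 0$. Moreover,
if $f^*_{ij} < 0$, it follows from $w_{ij} \le 1$ and
$1 + f^*_{ij} \ge 0$ that $\tilde{w}_{ij} = (1 + f^*_{ij}) w_{ij} \le 1 + f^*_{ij}$.

■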

Properties (i) and (ii) show that $\tilde{W}$ can be used as a
normalized weight matrix for spectral clustering. Furthermore,
property (iii) shows that the new weight matrix $\tilde{W}$
is actually derived from the original weight matrix $W$ by
increasing $w_{ij}$ for the must-link constraints with $f^*_{ij} > 0$ and
decreasing $w_{ij}$ for the cannot-link constraints with $f^*_{ij} < 0$.
This is consistent with our original motivation of exploiting
pairwise constraints for spectral clustering. Although there
may exist other weight adjustment methods that also satisfy
property (iii), our approach has two distinct advantages
over them given by properties (iv) and (v), respectively.

More concretely, property (iv) in Proposition 3 shows
the first distinct advantage of our weight adjustment approach,
i.e., must-link and cannot-link constraints play an equally
important role (with the same derivatives) in weight adjustment
if they have the same absolute confidence scores, which is
reasonable given no prior knowledge. In contrast, although
the simple choice of directly setting $\tilde{w}_{ij} = (1 + f^*_{ij}) w_{ij}$ for
any $f^*_{ij}$ also has property (iii), it puts more importance
on must-link constraints than on cannot-link constraints (i.e.
$1 + |f^*_{ij}| > 1 - |f^*_{ij}|$) even if they have the same absolute
confidence scores. As compared with this simple choice, the
second advantage of our weight adjustment approach is that
$\tilde{w}_{ij}$ can be ensured to be large when $f^*_{ij}$ takes a large positive
value according to property (v), even if $w_{ij}$ is initially
very small. Of the two methods, only our weight adjustment
approach has this desirable property. Here, it should be noted that
both weight adjustment methods can ensure that $\tilde{w}_{ij}$ is
small when $f^*_{ij}$ takes a large negative value according to
property (v), even if $w_{ij}$ is initially very large.

After we have successfully incorporated the exhaustive set
of propagated pairwise constraints obtained by our pairwise
constraint propagation algorithm into a new weight matrix
$\tilde{W}$, we then perform spectral clustering with this new weight
matrix. The corresponding constrained spectral clustering
algorithm is summarized as follows:

(1) Find the $K$ largest nontrivial eigenvectors $v_1, \ldots, v_K$ of
$\tilde{D}^{-1/2} \tilde{W} \tilde{D}^{-1/2}$, where $\tilde{D}$ is a diagonal matrix with
its $i$-th diagonal element being $\sum_j \tilde{w}_{ij}$.

(2) Form $E = [v_1, \ldots, v_K]$, and normalize each row of
$E$ to have unit length. Here, the $i$-th row $E_{i\cdot}$ is the
low-dimensional feature vector for data point $x_i$.

(3) Perform $k$-means clustering on the new feature vectors
$\{E_{i\cdot} : i = 1, \ldots, N\}$ to obtain $K$ clusters.
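
The three steps above can be sketched as follows. This is an illustrative NumPy version under several assumptions: it simply takes the $K$ leading eigenvectors of the normalized matrix (which may differ from how the authors select "nontrivial" eigenvectors), it uses a small hand-written $k$-means with a deterministic farthest-point initialization instead of any particular library routine, and the 6-node adjusted weight matrix is hypothetical.

```python
import numpy as np

def spectral_embedding(W_new, K):
    """Steps (1)-(2): leading eigenvectors, then row normalization."""
    d = W_new.sum(axis=1)
    S = W_new / np.sqrt(np.outer(d, d))          # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(S)                  # eigenvalues in ascending order
    E = vecs[:, -K:]                             # K leading eigenvectors (assumption)
    return E / np.linalg.norm(E, axis=1, keepdims=True)

def kmeans(X, K, n_iter=100):
    """Step (3): plain Lloyd iterations with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, K):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])       # farthest point from chosen centers
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical adjusted weights: two groups of three nodes, strongly linked
# inside each group (0.9) and weakly linked across groups (0.05).
W_new = np.full((6, 6), 0.05)
W_new[:3, :3] = 0.9
W_new[3:, 3:] = 0.9
np.fill_diagonal(W_new, 0.0)

E = spectral_embedding(W_new, K=2)
labels = kmeans(E, K=2)
```

The row normalization in `spectral_embedding` places every embedded point on the unit sphere, so $k$-means effectively separates points by direction. On this toy matrix the two groups of nodes end up in different clusters.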

The clustering results on the toy dataset (see Fig. 3(a)) by
the above algorithm are shown in Fig. 4(b). We can see that
the clustering obtained by our constraint propagation approach
is consistent with the ideal clustering of the dataset, while this
is not true for spectral learning [12] without using constraint
propagation (see Fig. 4(a)). In the following, since the pairwise
constraints used for constrained spectral clustering (CSC) are
obtained by our exhaustive and efficient constraint propagation
(E$^2$CP), the above clustering algorithm is also denoted as
E$^2$CP to distinguish it from other CSC algorithms.

Fig. 5. Illustration of two examples of Wikipedia articles. Each section of a
Wikipedia article is associated with a corresponding image. The images and
text denote two different data sources.

III. Multi-Source Constraint Propagation 

In this section, our E²CP approach is extended to the more 
challenging problem of pairwise constraint propagation on 
multi-source data. We first give our solution to such multi-source 
constraint propagation, and then propose an efficient 
algorithm. Finally, the proposed algorithm is applied to 
cross-modal multimedia retrieval. 

A. Problem and Solution 

We have provided a sound solution to the challenging 
problem of pairwise constraint propagation in the last section. 
However, this solution is limited to single-source data. That is, 
each pairwise constraint is defined over a pair of data points 
from the same source. In this paper, we further consider the even 
more challenging constraint propagation on multi-source data, 
where each pairwise constraint is defined over a pair of data 
points from different sources. In this case, pairwise constraints 
still specify whether a pair of data points occur together, and 
the goal of constraint propagation remains the same (i.e. to 
propagate the initial pairwise constraints throughout the entire 
dataset). The main difficulty of multi-source constraint 
propagation lies in how to propagate pairwise constraints across 
different data sources. Since this challenging problem can 
be readily decomposed into a series of two-source constraint 
propagation subproblems, we focus on two-source constraint 
propagation and formulate it in detail as follows. 

Let $\{\mathcal{X}, \mathcal{Y}\}$ be a two-source dataset, where 
$\mathcal{X} = \{x_1, \ldots, x_N\}$ and $\mathcal{Y} = \{y_1, \ldots, y_M\}$. 
It should be noted that $\mathcal{X}$ may have a different data size 
from $\mathcal{Y}$ (i.e. $N \neq M$). As 
an example, a two-source dataset can be generated with the 
Wikipedia articles (see Fig. 5), with images and text being the 
two data sources. For the two-source dataset $\{\mathcal{X}, \mathcal{Y}\}$, we can 
define a set of initial must-link constraints as 
$\mathcal{M} = \{(x_i, y_j) : l(x_i) = l(y_j)\}$ and a set of initial 
cannot-link constraints as 
$\mathcal{C} = \{(x_i, y_j) : l(x_i) \neq l(y_j)\}$, where $l(x_i)$ (or $l(y_j)$) 
is the label of $x_i \in \mathcal{X}$ (or $y_j \in \mathcal{Y}$). Here, $x_i$ and $y_j$ are 
assumed to share the same label set. If the class labels are not 
provided, the pairwise constraints can be defined only based 
on the correspondence between two data sources, which can be 
readily obtained from Web-based content such as Wikipedia 
articles. Several examples of pairwise constraints defined in 
this way are illustrated in Fig. 6. 

Fig. 6. Illustration of several examples of pairwise constraints defined on a 
two-source dataset. Both must-link and cannot-link constraints are generated 
based on the correspondence between images and text shown in Fig. 5. 

As we have mentioned, the goal of two-source constraint 
propagation is to propagate the two sets of initial pairwise 
constraints $\mathcal{M}$ and $\mathcal{C}$ across the two sources 
$\mathcal{X}$ and $\mathcal{Y}$. In fact, this is equivalent to finding 
the best solution $F^* \in \mathcal{F}$ based on $\mathcal{M}$ 
and $\mathcal{C}$, with 
$\mathcal{F} = \{F = \{f_{ij}\}_{N \times M} : |f_{ij}| \leq 1\}$. Here, it should 
be noted that any exhaustive set of pairwise constraints can be 
denoted as $F \in \mathcal{F}$, where $f_{ij} > 0$ means $(x_i, y_j)$ is a 
must-link constraint while $f_{ij} < 0$ means $(x_i, y_j)$ is a cannot-link 
constraint, with $|f_{ij}|$ denoting the confidence score of $(x_i, y_j)$ 
being a must-link (or cannot-link) constraint. In the following, 
$\mathcal{F}$ is viewed as the feasible solution set. 

Similar to the representation of initial pairwise constraints 
on single-source data given by equation (1), the two sets of 
initial pairwise constraints $\mathcal{M}$ and $\mathcal{C}$ on the two-source dataset 
$\{\mathcal{X}, \mathcal{Y}\}$ can be denoted with a single matrix 
$Z = \{z_{ij}\}_{N \times M}$: 

$$z_{ij} = \begin{cases} +1, & (x_i, y_j) \in \mathcal{M}; \\ -1, & (x_i, y_j) \in \mathcal{C}; \\ 0, & \text{otherwise}. \end{cases} \qquad (14)$$

We can find that $Z \in \mathcal{F}$. Now it can be stated that the goal 
of the two-source constraint propagation is to find the best 
solution $F^* \in \mathcal{F}$ based on $Z$. 
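
As a small illustration of equation (14), the following sketch builds $Z$ from class labels. The function name and the convention that a negative label marks an unlabeled point (so that the corresponding row or column is all zeros) are assumptions of this example, not part of the original formulation.

```python
import numpy as np

def initial_constraint_matrix(labels_x, labels_y):
    """Build the N x M matrix Z of equation (14) from integer class labels.

    Assumption of this sketch: a negative label marks an unlabeled point,
    for which no initial constraint is defined (z_ij = 0).
    """
    lx = np.asarray(labels_x)
    ly = np.asarray(labels_y)
    known = (lx[:, None] >= 0) & (ly[None, :] >= 0)
    same = lx[:, None] == ly[None, :]
    Z = np.where(same, 1.0, -1.0)   # +1 for must-link, -1 for cannot-link
    Z[~known] = 0.0                 # no constraint when either label is missing
    return Z
```

For instance, with `labels_x = [0, 1, -1]` and `labels_y = [1, 0]`, the third row of $Z$ is zero because the third point of the first source is unlabeled.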

Although the two-source constraint propagation problem 
is less complicated than the original multi-source constraint 
propagation problem, the task of finding the best solution 
$F^* \in \mathcal{F}$ based on $Z$ is still rather difficult. Fortunately, by 
making vertical and horizontal observations on $Z$, the two-source 
constraint propagation problem can be decomposed into 
semi-supervised learning subproblems, just as in our interpretation 
of pairwise constraint propagation on single-source data 
from a semi-supervised learning perspective. Furthermore, these 
semi-supervised learning subproblems can be similarly merged 
into a single optimization problem as follows: 

$$\min_{F} \; \|F - Z\|_F^2 + \frac{\mu_{\mathcal{X}}}{2} \mathrm{tr}(F^T L_{\mathcal{X}} F) + \frac{\mu_{\mathcal{Y}}}{2} \mathrm{tr}(F L_{\mathcal{Y}} F^T), \qquad (15)$$

where $\mu_{\mathcal{X}}$ (or $\mu_{\mathcal{Y}}$) denotes the regularization parameter for 
$\mathcal{X}$ (or $\mathcal{Y}$), and $L_{\mathcal{X}}$ (or $L_{\mathcal{Y}}$) denotes the normalized Laplacian 
matrix defined on $\mathcal{X}$ (or $\mathcal{Y}$) according to equation (2). The 
second and third terms of the above objective denote the energy 
functional [18] (also known as the smoothness measure) 
defined over $\mathcal{X}$ and $\mathcal{Y}$, respectively. When $\mathcal{X} = \mathcal{Y}$, the above 
two-source constraint propagation reduces to the traditional 
constraint propagation on single-source data given by equation 
(8). In summary, we have successfully formulated both 
single-source and multi-source constraint propagation as minimizing 
a regularized energy functional. 

Let $Q(F)$ denote the objective function in equation (15). 
Differentiating $Q(F)$ with respect to $F$ and setting it to zero, 
we have the following equation: 

$$\frac{\partial Q}{\partial F} = 2(F - Z) + \mu_{\mathcal{X}} L_{\mathcal{X}} F + \mu_{\mathcal{Y}} F L_{\mathcal{Y}} = 0. \qquad (16)$$

Hence, the two-source constraint propagation is equivalent to solving 
a Sylvester matrix equation [20]: 

$$(I + \mu_{\mathcal{X}} L_{\mathcal{X}}) F + F (I + \mu_{\mathcal{Y}} L_{\mathcal{Y}}) = 2Z. \qquad (17)$$

The above Sylvester matrix equation can be viewed as a 
generalization of the Lyapunov matrix equation (10). Since 
both $I + \mu_{\mathcal{X}} L_{\mathcal{X}}$ and $I + \mu_{\mathcal{Y}} L_{\mathcal{Y}}$ are positive definite, this 
Sylvester matrix equation has a unique solution according 
to [27]. A classical algorithm for the numerical solution of 
the Sylvester equation has been proposed in [20]. However, 
this algorithm incurs a large time cost. In the following, we 
will propose an approximate but efficient algorithm, instead 
of directly solving the Sylvester matrix equation. 
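
To make the cost of the direct route concrete, the sketch below solves equation (17) exactly for small dense problems. It is not the classical algorithm of [20]; it is a simpler eigendecomposition-based solver that is valid here only because both coefficient matrices are symmetric, and its function names are hypothetical. The normalized Laplacian is assumed to be $L = I - D^{-1/2} W D^{-1/2}$.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric W with positive row sums."""
    s = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - s[:, None] * W * s[None, :]

def solve_two_source_sylvester(L_x, L_y, Z, mu_x, mu_y):
    """Solve (I + mu_x L_x) F + F (I + mu_y L_y) = 2 Z for F.

    With A = U diag(a) U^T and B = V diag(b) V^T, the transformed unknown
    G = U^T F V satisfies (a_i + b_j) G_ij = (U^T (2Z) V)_ij entrywise.
    """
    A = np.eye(L_x.shape[0]) + mu_x * L_x
    B = np.eye(L_y.shape[0]) + mu_y * L_y
    a, U = np.linalg.eigh(A)
    b, V = np.linalg.eigh(B)
    G = (U.T @ (2.0 * Z) @ V) / (a[:, None] + b[None, :])
    return U @ G @ V.T
```

The two eigendecompositions cost on the order of $N^3 + M^3$ operations on dense matrices, which illustrates why a direct solve becomes expensive as the dataset grows.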

B. The Proposed Algorithm 

Considering the strategy of solving the pairwise constraint 
propagation problem (8) on single-source data, we can 
similarly handle the two-source constraint propagation problem 
(15) in two optimization steps: 
(1) $F_{\mathcal{X}}^* = \arg\min_F \frac{1}{2}\|F - Z\|_F^2 + \frac{\mu_{\mathcal{X}}}{2} \mathrm{tr}(F^T L_{\mathcal{X}} F)$; 
(2) $F^* = \arg\min_F \frac{1}{2}\|F - F_{\mathcal{X}}^*\|_F^2 + \frac{\mu_{\mathcal{Y}}}{2} \mathrm{tr}(F L_{\mathcal{Y}} F^T)$. 
That is, the pairwise constraint propagation is 
first performed on $\mathcal{X}$ and then on $\mathcal{Y}$. More notably, based on 
k-NN graphs, these two optimization problems can be solved 
efficiently using the label propagation technique [11]. 

Let $W_{\mathcal{X}}$ (or $W_{\mathcal{Y}}$) denote the weight matrix of the k-NN 
graph constructed on $\mathcal{X}$ (or $\mathcal{Y}$) just as in Step (1) of the algorithm 
proposed in Section II-B. The approximate algorithm for two-source 
constraint propagation can be summarized as follows: 

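(1) Compute two matrices 
$\mathcal{L}_{\mathcal{X}} = D_{\mathcal{X}}^{-1/2} W_{\mathcal{X}} D_{\mathcal{X}}^{-1/2}$ and 
$\mathcal{L}_{\mathcal{Y}} = D_{\mathcal{Y}}^{-1/2} W_{\mathcal{Y}} D_{\mathcal{Y}}^{-1/2}$, 
where $D_{\mathcal{X}}$ (or $D_{\mathcal{Y}}$) is a 
diagonal matrix with its $i$-th diagonal element being 
the sum of the $i$-th row of $W_{\mathcal{X}}$ (or $W_{\mathcal{Y}}$). 

(2) Iterate $F_{\mathcal{X}}(t+1) = \alpha_{\mathcal{X}} \mathcal{L}_{\mathcal{X}} F_{\mathcal{X}}(t) + (1 - \alpha_{\mathcal{X}}) Z$ for 
the pairwise constraint propagation on $\mathcal{X}$ until 
convergence, where $F_{\mathcal{X}}(t) \in \mathcal{F}$ and $\alpha_{\mathcal{X}}$ is a parameter 
in the range (0, 1). 

(3) Iterate $F_{\mathcal{Y}}(t+1) = \alpha_{\mathcal{Y}} F_{\mathcal{Y}}(t) \mathcal{L}_{\mathcal{Y}} + (1 - \alpha_{\mathcal{Y}}) F_{\mathcal{X}}^*$ 
for the pairwise constraint propagation on $\mathcal{Y}$ until 
convergence, where $F_{\mathcal{Y}}(t) \in \mathcal{F}$, $F_{\mathcal{X}}^*$ is the limit of 
$\{F_{\mathcal{X}}(t)\}$, and $\alpha_{\mathcal{Y}}$ is a parameter in the range (0, 1). 

(4) Output $F^* = F_{\mathcal{Y}}^*$ as the final representation of the 
propagated pairwise constraints, where $F_{\mathcal{Y}}^*$ is the 
limit of $\{F_{\mathcal{Y}}(t)\}$. 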
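
Steps (1)-(4) can be sketched with dense NumPy arrays as follows. This is a minimal sketch under stated assumptions: the function names, the stopping tolerance, and the choice of $Z$ as the starting point of the first iteration are all hypothetical, and dense matrices are used for brevity, whereas the efficiency argument in the text relies on sparse k-NN graphs.

```python
import numpy as np

def normalize_graph(W):
    """Step (1): D^{-1/2} W D^{-1/2} for a symmetric W with positive row sums."""
    s = 1.0 / np.sqrt(W.sum(axis=1))
    return s[:, None] * W * s[None, :]

def two_source_propagation(W_x, W_y, Z, alpha_x, alpha_y, tol=1e-10, max_iter=10000):
    """Propagate the initial constraints Z (N x M) first over X, then over Y."""
    S_x, S_y = normalize_graph(W_x), normalize_graph(W_y)

    # Step (2): vertical propagation, acting on the rows of F through S_x.
    F_x = Z.astype(float)
    for _ in range(max_iter):
        F_new = alpha_x * (S_x @ F_x) + (1.0 - alpha_x) * Z
        converged = np.abs(F_new - F_x).max() < tol
        F_x = F_new
        if converged:
            break

    # Step (3): horizontal propagation, acting on the columns of F through S_y.
    F_y = F_x.copy()
    for _ in range(max_iter):
        F_new = alpha_y * (F_y @ S_y) + (1.0 - alpha_y) * F_x
        converged = np.abs(F_new - F_y).max() < tol
        F_y = F_new
        if converged:
            break

    return F_y  # Step (4)

def closed_form_solution(W_x, W_y, Z, alpha_x, alpha_y):
    """Limit of the iteration, used here only as a reference for small problems."""
    S_x, S_y = normalize_graph(W_x), normalize_graph(W_y)
    left = np.linalg.inv(np.eye(S_x.shape[0]) - alpha_x * S_x)
    right = np.linalg.inv(np.eye(S_y.shape[0]) - alpha_y * S_y)
    return (1.0 - alpha_x) * (1.0 - alpha_y) * (left @ Z @ right)
```

On small random graphs the iterative result and the closed-form limit agree to numerical precision, which gives a quick sanity check for any implementation of the algorithm.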

Similar to our analysis of the algorithm proposed in 
Section II-B, we can readily prove that the above algorithm for 
two-source constraint propagation converges to: 

$$F^* = (1 - \alpha_{\mathcal{X}})(1 - \alpha_{\mathcal{Y}})(I - \alpha_{\mathcal{X}} \mathcal{L}_{\mathcal{X}})^{-1} Z (I - \alpha_{\mathcal{Y}} \mathcal{L}_{\mathcal{Y}})^{-1}, \qquad (18)$$

where $\alpha_{\mathcal{X}} = \mu_{\mathcal{X}}/(\mu_{\mathcal{X}} + 1)$ and 
$\alpha_{\mathcal{Y}} = \mu_{\mathcal{Y}}/(\mu_{\mathcal{Y}} + 1)$. If 
we first perform pairwise constraint propagation on $\mathcal{Y}$ and 
then on $\mathcal{X}$, the same solution can be obtained. That is, the 
pairwise constraint propagation can be performed on the two 
data sources alternately, no matter which is first (similar to 
Proposition 2(ii)). Moreover, we find that the above algorithm 
for two-source constraint propagation has a time complexity 
of $O(kNM)$, which is proportional to the total number of all 
possible pairwise constraints. Hence, this algorithm can be 
considered to provide an efficient solution. 
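
The limit in equation (18) can be checked with a short fixed-point argument, sketched below under the assumption that the normalized Laplacian is $L = I - \mathcal{L}$ with $\mathcal{L} = D^{-1/2} W D^{-1/2}$. Because the eigenvalues of $\mathcal{L}$ lie in $[-1, 1]$ and $\alpha \in (0, 1)$, the matrix $I - \alpha \mathcal{L}$ is invertible and both iterations converge.

```latex
\begin{align*}
% Fixed point of Step (2)
F_{\mathcal{X}}^* &= \alpha_{\mathcal{X}} \mathcal{L}_{\mathcal{X}} F_{\mathcal{X}}^* + (1-\alpha_{\mathcal{X}}) Z
  \;\Longrightarrow\;
  F_{\mathcal{X}}^* = (1-\alpha_{\mathcal{X}}) (I - \alpha_{\mathcal{X}} \mathcal{L}_{\mathcal{X}})^{-1} Z, \\
% Fixed point of Step (3)
F^* &= \alpha_{\mathcal{Y}} F^* \mathcal{L}_{\mathcal{Y}} + (1-\alpha_{\mathcal{Y}}) F_{\mathcal{X}}^*
  \;\Longrightarrow\;
  F^* = (1-\alpha_{\mathcal{Y}}) F_{\mathcal{X}}^* (I - \alpha_{\mathcal{Y}} \mathcal{L}_{\mathcal{Y}})^{-1}, \\
% Link between the regularization parameter and the propagation parameter
I + \mu L &= (1+\mu) I - \mu \mathcal{L} = (1+\mu)(I - \alpha \mathcal{L}),
  \qquad \alpha = \frac{\mu}{1+\mu}.
\end{align*}
```

Substituting the first line into the second gives equation (18). The order independence follows because one factor multiplies $Z$ from the left and the other from the right, so the two operations commute.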

Although the above constraint propagation algorithm is 
limited to two-source data, it can be readily applied to 
multi-source data. For example, given a three-source dataset 
$\{\mathcal{X}, \mathcal{Y}, \mathcal{Z}\}$, this constraint propagation algorithm can be 
performed on three two-source datasets $\{\mathcal{X}, \mathcal{Y}\}$, $\{\mathcal{X}, \mathcal{Z}\}$, and 
$\{\mathcal{Y}, \mathcal{Z}\}$, respectively. The obtained three groups of results can 
thus be used for retrieval with a query from a single source 
or a pair of queries from different sources. 

C. Application to Cross-Modal Multimedia Retrieval 

When the multiple sources refer to text, image, audio and 
so on, the output of our multi-source constraint propagation 
can be viewed as the correlation between different 
media sources. As we have mentioned, given the output 
$F^* = \{f_{ij}^*\}_{N \times M}$ of two-source constraint propagation, $(x_i, y_j)$ 
denotes a must-link constraint if $f_{ij}^* > 0$, while $(x_i, y_j)$ denotes 
a cannot-link constraint if $f_{ij}^* < 0$. Considering the inherent 
meaning of must-link and cannot-link constraints, we can state 
that $x_i$ and $y_j$ are "positively correlated" if $f_{ij}^* > 0$, while they 
are "negatively correlated" if $f_{ij}^* < 0$. That is, we can view $f_{ij}^*$ 
as the correlation coefficient between $x_i$ and $y_j$. The distinct 
advantage of such an interpretation of $F^*$ as a correlation measure 
is that $F^*$ can thus be used for ranking on $\mathcal{Y}$ given a query 
$x_i$ or ranking on $\mathcal{X}$ given a query $y_j$. In fact, this is the goal 
of cross-modal multimedia retrieval, which has drawn much 
attention recently [22]. That is, this challenging problem can 
be directly solved by our multi-source constraint propagation. 
It should be noted that, for cross-modal retrieval, it is not 
feasible to combine multiple modalities as previous 
multi-modal retrieval methods [23]-[25] did. 
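
The ranking use of $F^*$ described above amounts to sorting one row or one column of the matrix. A minimal sketch follows; the function name and the `source` argument are hypothetical conveniences of this example.

```python
import numpy as np

def rank_by_correlation(F_star, query_index, source="x"):
    """Return indices of the other source, sorted by decreasing f*_ij.

    source="x": the query is x_i, so row i of F* scores every y_j.
    source="y": the query is y_j, so column j of F* scores every x_i.
    """
    if source == "x":
        scores = F_star[query_index, :]
    else:
        scores = F_star[:, query_index]
    # A stable sort keeps the original order among tied scores.
    return np.argsort(-scores, kind="stable")
```

For example, with a 2 x 3 matrix whose first row is (0.2, -0.5, 0.9), the query $x_1$ ranks the third item of the other source first and the second item last.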

In this paper, we focus on a special case of cross-modal 
retrieval, i.e. only text and image modalities are considered. 
Although various image annotation systems [28]-[30] have 
been developed to automatically extract semantic descriptors 
from images, they rely on very limited types of textual 
representation. Images are simply associated with keywords 
or class labels, without explicit modeling of free-form text. 
In contrast, cross-modal retrieval is designed to deal with 
much more richly annotated data, motivated by the ongoing 
explosion of Web-based multimedia content such as news 
archives and Wikipedia pages (see Fig. 5). In these cases, 
images are related to complete text articles, and the 
correspondence between image and text modalities is much less 
direct than that provided by light annotation. In this respect, 
cross-modal retrieval is more difficult than image annotation. 
To the best of our knowledge, little attempt has been made to directly 
learn the correlation between images and free-form text. One 
exception is the notable work of [22], where two hypotheses 
have been investigated for cross-modal retrieval: 1) there is 
a benefit to explicitly modeling the correlation between text 
and image modalities, and 2) this modeling is more effective 
with higher levels of abstraction. More concretely, the 
correlation between the two modalities is learned with canonical 
correlation analysis (CCA) [31] and abstraction is achieved 
by representing text and image at a more general semantic 
level. However, two separate steps, i.e. correlation analysis 
and semantic abstraction, are involved in this modeling, and 
the use of abstraction after CCA seems rather ad hoc. 

Fortunately, this problem can be addressed by 
our multi-source constraint propagation. The semantic 
information (e.g. class labels) associated with images and text 
can be used to define the initial must-link and cannot-link 
constraints based on the training dataset, while the correlation 
between text and image modalities can be explicitly learnt by 
the proposed algorithm in Section III-B. That is, correlation 
analysis and semantic abstraction have been successfully 
integrated in a unified constraint propagation framework. Our 
later experimental results have shown the effectiveness of such 
integration as compared to [22]. 

IV. Experimental Results 

In this section, the proposed constraint propagation algo- 
rithms are evaluated in two applications: constrained spectral 
clustering and cross-modal multimedia retrieval. The con- 
straint propagation results can be directly used for cross- 
modal retrieval, but can only be indirectly used for constrained 
spectral clustering with an extra step of similarity adjustment. 

A. Constrained Spectral Clustering 

We first describe the experimental setup for constrained 
spectral clustering, including the clustering evaluation measure 
and the graph construction approach. Moreover, we compare 
our algorithm with other closely related methods on image and 
UCI datasets, respectively. 

1) Experimental Setup: For comparison, we present the 
results of affinity propagation (AP) [13], spectral learning 
(SL) [12], and semi-supervised kernel k-means (SSKK) [4], 
which are three constrained clustering algorithms closely 
related to our E²CP. Here, SL and SSKK adjust only the 
similarities between the constrained data points, while AP 
and our E²CP propagate the pairwise constraints throughout 
the entire dataset. It should be noted that AP cannot directly 
address multi-class problems and a time-consuming heuristic 
approach discussed in [13] has to be adopted. We also report 
the baseline results of normalized cuts (NCuts) [10], which is 
effectively a spectral clustering algorithm but without using 
pairwise constraints. 

Fig. 7. Sample images from 15 categories of the Corel dataset. 

We evaluate the clustering results with the adjusted Rand 
(AR) index [32], [33], which has been widely used for the 
evaluation of clustering algorithms. The AR index measures 
the pairwise agreement between the clustering obtained by an 
algorithm and the ground truth clustering, and takes a value 
in the range [-1,1]. A higher AR index indicates that a higher 
percentage of data pairs in the obtained clustering have the 
same relationship (must-link or cannot-link) as in the ground 
truth clustering. In the following, each experiment is repeated 
over 25 random runs, and the average AR index is reported as the final 
clustering evaluation measure. 
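
For readers who want to reproduce the evaluation measure, the sketch below computes the AR index from its standard pair-counting definition (the Hubert and Arabie form). It is an assumption of this example that this is the variant used in [32], [33]; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index of two labelings of the same n >= 2 points."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of the same length n >= 2")

    # Contingency counts: n_ij pairs of (true cluster, predicted cluster).
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both labelings place all points in one cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)
```

The index is invariant to how the clusters are numbered, so a clustering that matches the ground truth up to a relabeling scores 1.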

As we have mentioned, we construct a k-NN graph for our 
E²CP algorithm on each dataset. To ensure a fair comparison, 
the same k-NN graph is used by all the other spectral 
clustering algorithms on the same dataset. For image and UCI 
datasets, the weight matrices of k-NN graphs are defined with 
spatial Markov kernels [19] and Gaussian kernels, respectively. 
That is, the spatial Markov kernels are computed on the image 
datasets to exploit the spatial information [19], while the 
Gaussian kernels are used for the UCI datasets as in [13]. 

2) Results on Image Datasets: We select two different 
image datasets. The first one contains 8 scene categories from 
MIT [34], including four man-made scenes and four natural 
scenes. The total number of images is 2,688. The size of each 
image in this Scene dataset is 256 x 256 pixels. The second 
dataset contains images from a Corel collection. We select 15 
categories (see Fig. 7), and each of the categories contains 
100 images. In total, this dataset has 1,500 images. The size 
of each image in this dataset is 256 x 384 pixels. 

For the above two image datasets, we choose different 
feature sets which are introduced in [35] and [19], respectively. 
That is, as in [35], the SIFT descriptors are used for the 
Scene dataset, while, similar to [19], the joint color and Gabor 
features are used for the Corel dataset. These features are 
chosen to ensure a fair comparison with the state-of-the-art 
techniques. More concretely, for the Scene dataset, we extract 
SIFT descriptors of 16 x 16 pixel blocks computed over a 
regular grid with spacing of 8 pixels. As for the Corel dataset, 
we divide each image into blocks of 16 x 16 pixels and then 
extract a joint color/texture feature vector from each block. 
Here, the texture features are represented as the means and 
standard deviations of the coefficients of Gabor filters (with 
3 scales and 4 orientations), and the color features are the 
mean values of HSV color components. Finally, for each 
image dataset, we perform k-means clustering on the extracted 
feature vectors to form a vocabulary of 400 visual keywords, 
and then define a spatial Markov kernel [19] as the weight 
matrix for graph construction. 

Fig. 8. Illustration of the effect of different parameters on our E²CP algorithm with 2,400 initial pairwise constraints for the two image datasets. 

Fig. 9. Comparison between different pairwise constraint propagation approaches in single or multiple directions on the two image datasets. 

In the experiments, we randomly generate the initial 
pairwise constraints using the ground-truth cluster labels. 
Moreover, we provide our E²CP algorithm with a varying number 
of initial pairwise constraints. In the following, we consider 
that the number of initial pairwise constraints ranges from 
300 to 2,400. It should be noted that the initial pairwise 
constraints used here are actually very sparse. For example, 
the largest number of pairwise constraints (i.e. 2,400) can be generated 
with only 2.6% (i.e. about 70) of the images in the Scene 
dataset. Here, images from the same cluster form the 
must-link constraints while images from different clusters form the 
cannot-link constraints. When so few labeled images are 
initially provided, it is not feasible to select the parameters 
by cross-validation for our E²CP algorithm. Hence, we set 
$\alpha = 0.6$ and $k = 20$ empirically (similar to the parameter 
selection for many semi-supervised learning algorithms). In 
fact, as shown in Fig. 8, our E²CP algorithm is not sensitive 
to these two parameters, and we select a relatively small 
value for $k$ to ensure that it runs efficiently. 

Considering that our E²CP algorithm has two key steps, 
vertical constraint propagation and horizontal constraint 
propagation, we need to demonstrate the importance of the 
horizontal constraint propagation as a supplement to the 
vertical constraint propagation. Hence, we compare the 
following three constraint propagation approaches: only vertical 
propagation (VP), both vertical and horizontal propagation 
(VP+HP), and VP+HP followed by vertical propagation 
again (VP+HP+VP). The results are shown in Fig. 9. The 
immediate observation is that the horizontal propagation is 
crucial for our E²CP algorithm (see VP vs. VP+HP). Moreover, 
we also find that the vertical propagation does not need to be 
performed more than once, since extra propagation leads 
to very minor improvements (see VP+HP vs. VP+HP+VP). 

It should be noted that the pairwise constraints propagated 
by our E²CP algorithm cannot be directly used for spectral 
clustering; they first need to be exploited for similarity 
adjustment according to equation (13). The effectiveness of this 
similarity adjustment approach has been preliminarily verified 
by Proposition 3. Further verification is shown in Fig. 10, 
where NewWeight1 means $\tilde{w}_{ij} = 1 + f_{ij}^*$, NewWeight2 means 
$\tilde{w}_{ij} = (1 + f_{ij}^*) w_{ij}$, and NewWeight3 means equation (13). 
Here, NewWeight1 defines the new weight matrix only based 
on the output of our E²CP algorithm (i.e. the similarity 
adjustment step is actually ignored), while both NewWeight2 
and NewWeight3 additionally consider the original weight 
matrix. The significant gains achieved by NewWeight3 over 
NewWeight1 show that the similarity adjustment step 
according to equation (13) is crucial for the success of the 
overall method. Moreover, as compared with NewWeight2, 
our approach is shown to perform much better, which also 
provides further support for Proposition 3. 
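
The three weighting schemes can be compared side by side in code. NewWeight1 and NewWeight2 follow the definitions given above. Equation (13) is not reproduced in this section, so the form used for NewWeight3 below is an assumption of this sketch: it is chosen to match the properties stated earlier (a large positive $f_{ij}^*$ forces a large new weight even when $w_{ij}$ is small, and a large negative $f_{ij}^*$ forces a small new weight). The sketch also assumes $w_{ij} \in [0, 1]$ and $f_{ij}^* \in [-1, 1]$.

```python
import numpy as np

def new_weight_1(F, W):
    """NewWeight1: uses only the propagated constraints, ignoring W."""
    return 1.0 + F

def new_weight_2(F, W):
    """NewWeight2: rescales the original weights by (1 + f*_ij)."""
    return (1.0 + F) * W

def new_weight_3(F, W):
    """NewWeight3: ASSUMED form of equation (13), not taken from this section.

    f*_ij >= 0: push the weight towards 1, the more so the larger f*_ij is.
    f*_ij <  0: shrink the weight towards 0, as NewWeight2 does.
    """
    return np.where(F >= 0, 1.0 - (1.0 - F) * (1.0 - W), (1.0 + F) * W)
```

Under this assumed form, a pair with $f_{ij}^* = 0.9$ and $w_{ij} = 0.05$ receives a new weight of about 0.905, whereas NewWeight2 leaves it at about 0.095; for negative $f_{ij}^*$ the two schemes coincide.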

Finally, we compare our E²CP algorithm with other closely 
related methods on the two image datasets. To verify that the 
approximate solution obtained by our E²CP is comparable to 
that of the Lyapunov equation (10), we also make a comparison 
with the constraint propagation approach that directly solves 
the Lyapunov equation with $\mu = \alpha/(1 - \alpha)$ and $L = I - \mathcal{L}$ 
(denoted as LYAP). The overall clustering results are shown 
in Fig. 11, and the running time taken on the Scene dataset is 
also listed in Table I. Here, we run all the clustering algorithms 
(Matlab code) on a PC with 4GB RAM and two 2.66 GHz 
CPUs. The immediate observation is that the performance of 
our E²CP algorithm is comparable to (even slightly better than) that 
of LYAP. Considering that our E²CP algorithm incurs much 
less time cost, we prefer it to LYAP in practice. 

Fig. 10. Comparison between different approaches to computing the new weight matrix for constrained spectral clustering on the two image datasets. 

Fig. 11. The clustering results on the two image datasets by different algorithms when a varying number of pairwise constraints are initially provided. 

TABLE I 

The running time taken by different algorithms on the Scene 
dataset when 2,400 pairwise constraints are initially provided. 

Algorithms     E²CP   LYAP   AP   SL   SSKK   NCuts 
Time (sec.)    12     272    21   9    35     6 

A further observation on Fig. 11 shows that our E²CP 
performs consistently better than other closely related methods. 
That is, the effectiveness of our exhaustive constraint 
propagation approach to exploiting pairwise constraints for spectral 
clustering has been verified by the promising performance of 
our E²CP. In contrast, SL and SSKK perform unsatisfactorily, 
and, in some cases, their performance degrades to 
that of NCuts. This may be because, by merely adjusting 
the similarities between the constrained images, these 
approaches have not fully utilized the additional supervisory 
or prior information inherent in the constrained images, and 
hence cannot discover the complex structures hidden in the 
challenging image datasets. Although AP can also propagate 
pairwise constraints throughout the entire dataset like our 
E²CP, the heuristic approach discussed in [13] may not 
address multi-class problems for the challenging image datasets, 
which thus leads to unsatisfactory results. Moreover, another 
important observation is that the performance improvement 
by our E²CP with respect to NCuts becomes more obvious 
when more pairwise constraints are provided, while this is 
not the case for AP, SL or SSKK. In other words, the 
pairwise constraints have been exploited more exhaustively and 
effectively by our pairwise constraint propagation. 

TABLE II 

Four UCI datasets used in the experiments. The features are 
first normalized to the range [-1, 1] for all the datasets. 

Datasets      Wine   Ionosphere   Soybean   WDBC 
# samples     178    351          47        569 
# features    13     34           35        30 
# clusters    3      2            4         2 

Besides the above advantages over other closely related 
methods, our E²CP has another advantage in terms of time 
cost. That is, as shown in Table I, the running time of our 
E²CP is comparable to that of the constrained spectral 
clustering algorithm without using constraint propagation (i.e. SL). 
Moreover, as for the two constraint propagation approaches 
(i.e. E²CP and AP), our E²CP runs faster than AP, particularly 
for multi-class problems. 

3) Results on UCI Datasets: We further conduct 
experiments on four UCI datasets, which are described in Table II. 
The UCI datasets have been widely used to evaluate clustering 
and classification algorithms in machine learning. Here, as 
in [13], the Gaussian kernel is defined on each UCI dataset 
for computing the weight matrix during graph construction. 
The experimental setup and parameter selection on the UCI 
datasets are similar to those for the image datasets. Since we 
have verified the effectiveness of each important component 
of our E²CP algorithm on the image datasets, we only make 
a comparison with other closely related methods on the UCI 
datasets. The clustering results are shown in Fig. 12. 

Again, we can find that our constraint propagation approach 
(i.e. E²CP) achieves improved performance in most cases. 
Moreover, the other three constrained clustering approaches 
(i.e. AP, SL, and SSKK) are shown to have generally benefited 
from the pairwise constraints as compared to NCuts. This 
observation is different from that on the image datasets. As 
we have mentioned, this may be because, given the 
complexity of the image datasets, a more exhaustive 
propagation (like our E²CP) of the pairwise constraints is needed 
in order to fully utilize the inherent supervisory information 
provided by the pairwise constraints. Our experimental results 
have also demonstrated that an exhaustive propagation of the 
pairwise constraints in the UCI datasets through our E²CP 
leads to improved clustering performance over the other three 
constrained clustering approaches (i.e. AP, SL, and SSKK). 

B. Cross-Modal Multimedia Retrieval 

In this subsection, our multi-source constraint propagation 
(MSCP) algorithm is evaluated in the challenging application 
of cross-modal multimedia retrieval. We focus on comparing 
our MSCP algorithm with the state-of-the-art approach [22], 
since they both consider not only correlation analysis but also 
semantic abstraction for text and image modalities. To verify 
that the approximate solution obtained by our MSCP is 
comparable to that of the Sylvester equation (17), we also make 
a comparison with the approach that directly solves the Sylvester 
equation with $\mu_{\mathcal{X}} = \alpha_{\mathcal{X}}/(1 - \alpha_{\mathcal{X}})$, 
$\mu_{\mathcal{Y}} = \alpha_{\mathcal{Y}}/(1 - \alpha_{\mathcal{Y}})$, 
$L_{\mathcal{X}} = I - \mathcal{L}_{\mathcal{X}}$, and $L_{\mathcal{Y}} = I - \mathcal{L}_{\mathcal{Y}}$ (denoted as SYLV). 



1) Experimental Setup: We conduct experiments on a 
cross-modal retrieval benchmark dataset [22], which contains 
a total of 2,866 documents. Each document is actually a 
text-image pair, annotated with a label from the vocabulary of 10 
semantic classes. Here, it should be noted that these documents 
are derived from Wikipedia's "featured articles". The original 
featured articles are categorized by Wikipedia's editors into 
29 categories, and these category labels are assigned to both 
the text and image components of each article. Since some of 
the categories are very scarce, we only consider the 10 most 
populated ones, just as in [22]. 

In this dataset, the text representation for each document 
is derived from a latent Dirichlet allocation (LDA) model 
with 10 latent topics, while the image representation is based 
on a bag-of-words model with 128 visual words learnt from 
the extracted SIFT descriptors, which is exactly the same as in 
[22]. Moreover, following the strategy of [22], the normalized 
correlation measure is used to define the similarity matrix for 
both text and image representation. 

The benchmark dataset [22] is split into a training set of 
2,173 documents and a test set of 693 documents. The initial 
pairwise constraints for our MSCP algorithm are derived from 
the class labels of the training documents. The performance 
of our MSCP algorithm is evaluated on the test set. Here, 
two tasks are considered: text retrieval using an image query, 
and image retrieval using a text query. In the following, these 
two tasks are denoted as "Image Query" and "Text Query", 
respectively. For each task, the retrieval results are measured 
with mean average precision (MAP), which has been widely 
used in the image retrieval literature [36]. 
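
For reference, MAP can be computed from ranked relevance lists as sketched below. The sketch assumes that a retrieved item counts as relevant when it shares the class label of the query, and that the full ranked list is evaluated; the exact protocol of [22] may differ, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list of booleans (True = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant position
    return precision_sum / hits if hits else 0.0

def mean_average_precision(all_ranked_relevance):
    """Mean of the average precision over all queries."""
    scores = [average_precision(r) for r in all_ranked_relevance]
    return sum(scores) / len(scores)
```

For a query whose ranked list has relevant items at positions 1 and 3 out of 3, the average precision is (1/1 + 2/3)/2, i.e. about 0.833.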

Let $\mathcal{X}$ denote the text source and $\mathcal{Y}$ denote the image 
source. For our MSCP algorithm, we construct k-NN graphs 
on $\mathcal{X}$ and $\mathcal{Y}$ with the same $k$. As we have mentioned, in 
the application of constrained spectral clustering on image 
datasets, at most 70 labeled images are initially provided for 
our E²CP algorithm and thus we cannot select the parameters 
by cross-validation on such a small training set. However, for 
the cross-modal retrieval benchmark dataset [22], we have 
a much larger training set (of size 2,173) and all the 
parameters (i.e. $\alpha_{\mathcal{X}}$, $\alpha_{\mathcal{Y}}$, and $k$) can be selected by fivefold 
cross-validation for our MSCP algorithm. More concretely, 
according to Fig. 13, we set the three parameters as 
$\alpha_{\mathcal{X}} = 0.1$, $\alpha_{\mathcal{Y}} = 0.025$, and $k = 90$. We can also find that our MSCP 
algorithm is not sensitive to these parameters. 

Fig. 13. The cross-modal retrieval results by fivefold cross-validation on the training set for our MSCP algorithm. 

TABLE III 

The cross-modal retrieval results on the test set measured by 
MAP scores. 

Methods        CA [22]   SA [22]   CA+SA [22]   SYLV    MSCP 
Image Query    0.249     0.225     0.277        0.329   0.329 
Text Query     0.196     0.223     0.226        0.256   0.256 

2) Cross-Modal Retrieval Results: As reported in [22], both 
correlation analysis (CA) and semantic abstraction (SA) play 
an important role in cross-modal retrieval (also see Table III). 
However, these two steps are completely separate in this 
modeling, and the use of semantic abstraction after correlation 
analysis (i.e. CA+SA) seems rather ad hoc. 

In fact, as we have mentioned in Section III-C, these 
two steps can be seamlessly integrated in a unified 
multi-source constraint propagation framework. That is, the semantic 
information (e.g. class labels) associated with images and text 
can be used to define the initial must-link and cannot-link 
constraints based on the training set, while the correlation 
between text and image modalities can be explicitly learnt by 
our MSCP algorithm proposed in Section III-B. The 
cross-modal retrieval results listed in Table III have shown the 
effectiveness of such integration as compared to the 
state-of-the-art approach (CA+SA) [22]. This means that the initial 
supervisory information provided for cross-modal retrieval can 
be more exhaustively utilized by our MSCP algorithm through 
pairwise constraint propagation across text and image 
modalities, which is similar to the observations in the application of 
constrained spectral clustering. 

Moreover, we can also find that the performance of our 
MSCP algorithm is as good as that of SYLV. Considering that 
our MSCP algorithm incurs much less time cost (34 seconds 
vs. 385 seconds), we prefer it to SYLV in practice. In fact, this 
conclusion is the same as that made for our E²CP algorithm 
as compared to LYAP on single-source data. 



V. Conclusions 

We have investigated the pairwise constraint propagation 
problem from a semi-supervised learning perspective. By 
decomposing this challenging problem into a set of independent 
semi-supervised learning subproblems, we have successfully 
formulated it as minimizing a regularized energy functional. 
More importantly, these semi-supervised learning subproblems 
can be solved efficiently and in parallel using the label 
propagation technique based on k-nearest neighbor graphs. 
The resulting exhaustive set of propagated pairwise constraints 
is exploited for similarity adjustment in the application of 
constrained spectral clustering. It is worth noting that we have, 
for the first time, clearly shown how pairwise constraints are propagated 
throughout the entire dataset. The proposed approach on 
single-source data is further extended to the more challenging 
constraint propagation on multi-source data, with an important 
application to cross-modal multimedia retrieval. Extensive 
results have shown that our exhaustive and efficient constraint 
propagation approach can achieve superior performance in 
both constrained spectral clustering and cross-modal retrieval. 
For future work, our approach will also be used to improve 
the performance of other graph-based methods by exhaustively 
exploiting the pairwise constraints. 